Animated species distribution maps with {gifski}

Mapping the distribution of a species is a useful way to understand how environmental conditions influence its habitat range. One useful way to see changes over time is by using animation to view multiple distributions in succession. Here we will model the distribution of Nudibranchia across Australia each month to create an animated GIF of its distribution over a year.

Eukaryota
Animalia
Mollusca
Maps
Authors

Stephanie Woolley

Olivia Torresan

Dax Kellie

Published

February 26, 2022

Author

Stephanie Woolley
Olivia Torresan
Dax Kellie

Date

26 February 2023

Species distribution models (SDMs) are used to predict the habitat range of different organisms. They offer some of the most informative ways to show how shifting environmental conditions affect where species occur in an area.

Yet species rarely stay in the same spot for long periods of time. Just like us, they react to changes in their environment, interactions with other species, and interactions with other individuals. As a result, it can be very useful to see how a distribution of a species changes in space and over time. In marine environments, for example, seemingly small changes in temperature, chemicals and light can result in large changes to a species’ distribution.

Here we will use a series of SDMs to predict the distribution of Nudibranchia around Australia each month to create an animated map that shows how nudibranch distributions change over the year.

This post is inspired by Liam Bailey’s cool (and hilarious) Bigfoot distribution map. You can find his code here.

A brief intro to species distribution models

At their simplest, SDMs are statistical models that use information of where a species has occurred to predict where it occurs over a broader area, even in places it hasn’t been seen yet!

SDMs are particularly informative because they can use information about the environment, added as predictor variables, to help calculate where a species is most likely might live. Common environmental predictor variables include temperature, rainfall, chemical concentrations and even locations of other organisms.

This means that SDMs are usually given 2 main inputs:

  1. Occurrence data
  2. Environmental variables

Using these inputs, our species distribution model can infer the probability of where a species occurs by combining information about where and under what environmental conditions a species is observed (or not observed) in our data.

Download data

Occurrence data

Let’s first download observations of Nudibranchia across Australia.

We’ll load the necessary packages.

library(galah)
library(tidyverse)
library(glue)
library(lubridate)
library(stars)         # Raster management 
library(ozmaps)        # Australian map
library(SSDM)          # Linear modelling
library(sdmpredictors) # Environmental variables 
library(grDevices)     # Colours and fonts
library(maps)          # Cities for map
library(tmaptools)     # Create plot ratio
library(gifski)        # Create GIF
library(knitr)         # View GIF

Now we will use the {galah} package to download observations of Nudibranchia.

You will need to provide a registered email with the ALA to galah_config() before retrieving records.

# Add registered email (register at ala.org.au)
galah_config(email = "your-email@email.com")
# Download observations
nudibranch_occurrences <- 
  galah_call() |>                               
  galah_identify("Nudibranchia") |>   
  galah_filter(country == "Australia") |>
  galah_apply_profile(ALA) |> # ALA's set of data cleaning filters
  atlas_occurrences() 

Environmental Variables

Now we will download our environmental variables for our model.

For our Nudibranchia model, we will use 4 common marine environmental variables:

  • Sea surface temperature
  • Sea surface salinity
  • Distance to shore
  • Bathemetry

To get them, we’ll use load_layers() from the {sdmpredictors} package to download our variables as raster layers (geographic layers that have a value per pixel of our variable). We’ll use the rasterstack argument to combine our layers into one object.

Note

The {sdmpredictors} package has lots of data sets and layers available. Check out their website to learn more.

# Download variables
env <- load_layers(layercodes = c("MS_biogeo08_sss_mean_5m", 
                                  "MS_biogeo13_sst_mean_5m", 
                                  "MS_biogeo05_dist_shore_5m", 
                                  "MS_bathy_5m"), 
                   equalarea = FALSE, 
                   rasterstack = TRUE)

To prepare variable data for our model, we need to crop the geographical boundaries of our data to include only the coast (and surrounding ocean) of Australia. With the help of the {raster} package, we’ll use extent() to set the outer boundaries and crop() to remove the land.

# Create extent
aus_ext <- raster::extent(100, 165, -45, -10)

# Limit environmental variables
aus_env <- raster::crop(env, aus_ext) 

# Check variables 
plot(aus_env)

Prepare data

To construct our animated GIF, we can make each “frame” of our animation a species distribution map of each month - that means 12 maps, January to December.

In order to do this, we’ll create a set of custom functions that can:

  1. Filter our nudibranch observations into a smaller data.frame containing observations for a specific month,
  2. Run an SDMs on data in our smaller data.frame
  3. Plot the SDM results onto a map
  4. Save the map with a nice name

By making custom functions for these tasks, we’ll be able to run each function in a loop so we can do each thing 12 times for each of our 12 months.

At the end, we’ll stitch our 12 maps together and, Voila! We’ll officially be animators (Pixar here we come!).

First we’ll need to make a few data wrangling steps to clean and organise our data:

  • filter out NA values and duplicates (which might cause our model to error)
  • extract month of observation into its own column with the month() function
  • select month, latitude, longitude columns of each observation
# Clean, filter and convert time series to months 
occurrences_clean <- 
  nudibranch_occurrences |> 
  filter(!is.na(decimalLatitude) & !is.na(decimalLongitude)) |>
  filter(!duplicated(decimalLatitude) & !duplicated(decimalLongitude)) |>
  mutate(month = month(eventDate)) |>
  select(month, 
         latitude = decimalLatitude, 
         longitude = decimalLongitude) 

head(occurrences_clean)
# A tibble: 6 × 3
  month latitude longitude
  <dbl>    <dbl>     <dbl>
1    12    -55.1     159. 
2     3    -54.5     159. 
3    NA    -52.4      71.9
4     1    -45.1     146. 
5     1    -44.3     147. 
6     1    -44.3     147. 

From here, let’s create 12 separate data.frames containing observations for each month.

To help, we’ll make our own function make_months_df() that filters our observations in occurrences_clean to our chosen month and selects the latitude and longitude columns:

# Build function (for each month select the lat and long)
make_months_df <- function(chosen_month) {
  monthly_data <- occurrences_clean %>% 
    filter(month == {{chosen_month}}) %>%
    select(latitude, longitude)
}

Because each month is assigned as a number, with the help of purrr::map() we can run a loop over our make_months_df() function by specifying which month number we want data for.

Let’s create a vector containing numbers 1-12 (n_months) which we’ll loop over our function to return our 12 data.frames as a list (month_list).

n_months <- c(1:12)
month_list <- purrr::map(n_months, make_months_df)

month_list[[1]] # See output of month 1
# A tibble: 1,558 × 2
   latitude longitude
      <dbl>     <dbl>
 1    -45.1      146.
 2    -44.3      147.
 3    -44.3      147.
 4    -43.1      148.
 5    -43.1      147.
 6    -43.0      148.
 7    -42.6      148.
 8    -41.9      148.
 9    -41.2      146.
10    -41.1      146.
# … with 1,548 more rows

Species Distribution Model

Now that month_list contains our 12 data.frames, we’re ready to make our SDM.

To build our overall SDM we will run 3 different statistical models. Each model type makes different assumptions about the data and its variation, and without getting bogged down in statistics, this just means that each model will calculate slightly different probabilities of where nudibranchs occur because of these different assumptions.

We’ll then merge the 3 probabilities into one final value using Fisher’s combined probability. Merging multiple model results will help to balance our final value between our 3 model’s results so that it is less skewed by our model choice, hopefully giving us a more conservative final estimate.

We’ll build another custom function to run these models for each data.frame in month_list. We’ll call this function run_sdm_model, and all it does is run the exact same SDM as we did earlier, but uses the chosen month for occurrences rather than the entire data set.

Warning

We’ve chosen to use the model used by Liam Bailey in his Bigfooot map. It’s a fairly flexible model that suits our purposes to quickly see where nudibranchs are observed around Australia. But it is by no means the most robust SDM (and would not hold up through a more rigourous peer-review process). If you are trying to make a more informative species distribution model, you should consider another method!

# Create a function that calculates the chi squared value 
Chi_sq <- function(x){     
  1 - pchisq(q = x, df = 6)
}

# Species distribution model function
run_sdm_model <- function(chosen_month) {
 
    SDM_GLM <- modelling("GLM",
                     Occurrences = (chosen_month),
                     Env = aus_env,
                     Xcol = 'longitude', 
                     Ycol = 'latitude', 
                     verbose = FALSE) 
  
     SDM_MARS <- modelling("MARS",
                      Occurrences = (chosen_month),
                      Env = aus_env,
                      Xcol = 'longitude', 
                      Ycol = 'latitude', 
                      verbose = FALSE)

     SDM_CTA <- modelling("CTA",
                     Occurrences = (chosen_month),
                     Env = aus_env,
                     Xcol = 'longitude', 
                     Ycol = 'latitude', 
                     verbose = FALSE)

# Combine these to one value using Fisher's combined probability
combined <- -2 * (log(SDM_MARS@projection) + log(SDM_GLM@projection) + log(SDM_CTA@projection))

# Apply the function to our data
combined_pval <- raster::calc(combined, fun = Chi_sq) 

# Convert to stars as it is easier to use with ggplot
species_distribution <- stars::st_as_stars(combined_pval) 

return(species_distribution)

}

Now we can use purrr::map() to run another loop to return results of our 12 SDMs.

# Run & save models
model_list <- purrr::map(month_list, run_sdm_model) 

Map

We now have the results of our 12 SDMs in model_list. We can use these results to make 12 maps.

To help orient ourselves, let’s download point data of the main cities in Australia to add to our maps.

# Download all cities in Australia
world_cities <- world.cities[world.cities$country.etc == "Australia",]

# Filter to the specific cities for the east and west coast
cities <- world_cities %>% 
  filter(name == "Sydney" | name == "Melbourne" | name == "Perth" | 
           name == "Brisbane" | name == "Cairns" | name == "Canberra" |
           name == "Adelaide" | name == "Melbourne")

Let’s also choose a colour palette for our map. To make our map easier to interpret, we’ve elected to make lower values a dark blue shade (so they look like the ocean) and higher values a brighter yellow:

blue_yellow <- c( "#184E77", "#1E6091",  "#168AAD",  "#34A0A4",  "#52B69A", 
                  "#76C893", "#99D98C", "#B5E48C",  "#D9ED92")

colour_palette <- colorRampPalette(blue_yellow)(50)
feathers::print_pal(colour_palette)

Now we are ready to make maps of our results!

We’ll once again make a custom function make_the_map() to construct each map. This function contains a few things to suit our needs:

  • Plots our SDM results with geom_stars()
  • Adds each month’s 3-letter abbreviation month_label to the top of each map
  • Limits the map to the east coast (there appears to more nudibranchs there, and zooming in will help us see the detail)
  • Adds our nice colour palette with scale_fill_gradientn() & a formats a custom legend with guide_colourbar()
  • Adds cities to our map

Let’s make some month labels:

# Get label for each month
month_label <- month(1:12, label = TRUE)
month_label
 [1] Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec

And now we’ll create our make_the_map() function:

# Map making function (for monthly SDM build this map)
make_the_map <- function(model_data, month_label) {

  month <- {{month_label}}
  
  ggplot() +
    geom_stars(data = model_data) +
    geom_sf(
      data = ozmaps::ozmap_states, 
      colour = "#A9A793", 
      fill = "#C8C6AF") +
    scale_fill_gradientn(
      colours = c(colour_palette),
      limits = c(0, 1),
      guide = guide_colourbar(
        title = "Occurrence\nprobability",
        title.theme = element_text(
          family = "Times New Roman",
          colour = "#3D4040",
          size = 10,
          face = "bold"
        ),
        label.theme = element_text(
          colour = "#3D4040",
          size = 8
        ),
        ticks = FALSE,
        frame.colour = "#3D4040",
        title.position = "top",
        title.vjust = 2,
        label.position = "left"
      ),
      breaks = c(0, 0.5, 1),
      labels = c("0%", "50%", "100%")
      ) +
    # Title map with month
    labs(
      title = glue("{month_label}")) + 
    theme_void() +
    theme(
      legend.position = c(1.2, 0.2), # reposition legend
      legend.direction = "vertical",
      plot.background = element_rect(fill = "#FFFFFF", colour = NA),
      plot.margin = unit(c(0.01, 2.5, 0.1, 0.1), "cm")) +
    # Zoom into East Coast
    coord_sf(
      crs = "WGS84",
      xlim = c(135, 159),
      ylim = c(-42.9, -11.5)
      ) +
    # Add city points
    geom_point(
      data = cities, 
      aes(x = long, y = lat), 
      colour = "#3D4040") +
    # Add city labels
    geom_text(
      data = cities, 
      aes(x = long, y = lat, label = name), 
      colour = "#3D4040",
      nudge_x = -1.8, 
      nudge_y = 0.6, 
      family = "Times New Roman")

}

We’ll also create custom save function that makes sure our maps are all saved as the same dimensions, gives them a file name we can use to order our maps correctly later, and saves them as a png in a new folder called maps.

To figure out the best aspect ratio, we’ll use the get_asp_ratio() function from the {tmaptools} package, and use that ratio to calculate the width of our map.

# Build a ratio based off of our distribution results
plot_ratio <- get_asp_ratio(model_list[[1]])

# Define letter to save map as (lets us import them in correct order for gif)
letter_id <- as.list(letters[1:12])

# Map-saving function
save_the_map <- function(chosen_map, id_label) {
  map_label <- {{id_label}}
  ggsave(chosen_map,
         width = plot_ratio*4, height = 5,
         file = glue("maps/map_{map_label}.png"), 
         device = "png")
}

Now we can use map2() to run our custom functions in a loop:

# Make SDM maps and save each map
model_list %>%
  purrr::map2(
    .x = .,
    .y = month_label,
    .f = make_the_map
  ) %>%
  purrr::map2(
    .x = .,
    .y = letter_id,
    .f = save_the_map
  )

We should now have 12 species distribution maps saved, and we can see them by returning all the files in our maps folder!

list.files(path = "maps/")
 [1] "map_a.png" "map_b.png" "map_c.png" "map_d.png" "map_e.png" "map_f.png"
 [7] "map_g.png" "map_h.png" "map_i.png" "map_j.png" "map_k.png" "map_l.png"

Let’s save our file names in an object so we can use them to make our GIF. We just have to use glue() to put the folder name on the front.

# Save file names
png_files <- list.files(path = "maps/")
map_paths <- glue("maps/{png_files}")

Make GIF

For context before we see our animation, let’s first look at the distribution of all nudibranchs across Australia (this uses all the same code as above, but without all the looping)

Code
## Run the 3 models on our data
SDM_GLM <- modelling("GLM",
                     Occurrences = occurrences_clean,
                     Env = aus_env,
                     Xcol = 'longitude', 
                     Ycol = 'latitude', 
                     verbose = FALSE) 

SDM_MARS <- modelling("MARS",
                      Occurrences = occurrences_clean,
                      Env = aus_env,
                      Xcol = 'longitude', 
                      Ycol = 'latitude', 
                      verbose = FALSE)

SDM_CTA <- modelling("CTA",
                     Occurrences = occurrences_clean,
                     Env = aus_env,
                     Xcol = 'longitude', 
                     Ycol = 'latitude', 
                     verbose = FALSE)

# Combine these to one value using Fisher's combined probability
combined <- -2 * (log(SDM_MARS@projection) + log(SDM_GLM@projection) + log(SDM_CTA@projection))

# Apply the function to our data
combined_pval <- raster::calc(combined, fun = Chi_sq) 

# Convert to stars as it is easier to use with ggplot
species_distribution <- stars::st_as_stars(combined_pval) 


## MAP 

ggplot() +
  geom_stars(data = species_distribution) + # Plot SDM results
  geom_sf(data = ozmaps::ozmap_country, # Add Australian map
          colour = "grey", 
          fill = "#C8C6AF") + 
  coord_sf(crs = "WGS84", # Set geographical boundaries
           xlim = c(112, 154), 
           ylim = c(-43, -11)) + 
  scale_fill_gradientn(
    colours = c(colour_palette), # Use custom palette
    limits = c(0, 1),
    guide = guide_colourbar(
      title = "Occurrence probability", # title of legend
      title.theme = element_text( # style legend title
        family = "Times New Roman", 
        colour = "#B3B6B6",
        face = "bold",
        size = 12
      ),
      label.theme = element_text( # style legend text
        colour = "#B3B6B6", 
        size = 10
      ),
      ticks = FALSE,
      frame.colour = "#B3B6B6",
      title.position = "top"
    ),
    breaks = c(0, 0.5, 1),
    labels = c("0%", "50%", "100%")
  ) + 
  theme_void() +
  theme(
    legend.position = c(0.2, 0.1),
    legend.direction = "horizontal",
    legend.key.size = unit(5, "mm"),
    plot.background = element_rect(fill = "#F7F7F3", color = "#F16704"),
    panel.border = element_rect(
      color = "#FFFFFF",
      fill = NA,
      size = 2)
    )

Our map shows that there are nudibranchs along pretty much the entire coastline of Australia! Their distribution appears to spread slightly further from the coast in the northern half of Australia.

Now let’s finish our animation by sticking our 12 monthly maps together with the {gifski} package!

# Create animation
gifski(map_paths, gif_file = "SDM.gif", delay = 0.5)
knitr::include_graphics("SDM.gif")

We now have our animated GIF!

Our animation shows that nudibranchs can be observed all year long, though there are some months where you are more likely to observe nudibranchs in more places than others.

However, our animation also seems to have a few limitations:

  • It’s hard to tell what time of the year nudibranchs are more likely to occur
    Distributions predicted using more observations are less influenced by outliers. This is because our model has more information in these months to help make its predictions, which makes their results more robust. However, whether one month had more observations than another isn’t something you can easily tell from our animation (a distribution doesn’t necessarily get bigger when there are more observations, and smaller when there are fewer). The truth is, there are more observations of nudibranchs from October-January, and fewer from May-August. Without already knowing, we probably couldn’t tell this from our map, and so our map can be misleading for someone interpreting its results.

  • Some distributions are blurry
    Blurriness happens when our model is less able to make specific predictions of where nudibranchs are (i.e. greater model uncertainty). Uncertainty can grow either because there is less data or because data are more varied in a given month. Again, just from looking at our animation, it’s difficult to tell whether some months nudibranchs are distributed more widely, or whether there just wasn’t much data for that month.

In the end, our animation is a nice demonstration of how model uncertainty can make it hard to interpret results. When data are broken down into smaller and smaller groups (which often happens over the course of an entire analysis), we increase the chance of model uncertainty because there are fewer data points for our models to use. It’s good to be wary of this phenomenon so you can be careful and avoid over-interpretting your results (and can be wary if you see other people doing it elsewhere)!

Nonetheless, we hope you’ve felt the thrill of constructing your own stop-motion animation with {ggplot2} and {gifski}!

Expand for session info
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.2.2 (2022-10-31 ucrt)
 os       Windows 10 x64 (build 19044)
 system   x86_64, mingw32
 ui       RTerm
 language (EN)
 collate  English_Australia.utf8
 ctype    English_Australia.utf8
 tz       Australia/Sydney
 date     2023-03-07
 pandoc   2.19.2 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package       * version date (UTC) lib source
 abind         * 1.4-5   2016-07-21 [1] CRAN (R 4.2.0)
 dplyr         * 1.1.0   2023-01-29 [1] CRAN (R 4.2.2)
 forcats       * 0.5.2   2022-08-19 [1] CRAN (R 4.2.1)
 galah         * 1.5.1   2023-02-21 [1] Github (AtlasOfLivingAustralia/galah@bd43dd2)
 ggplot2       * 3.3.6   2022-05-03 [1] CRAN (R 4.2.1)
 gifski        * 1.6.6-1 2022-04-05 [1] CRAN (R 4.2.1)
 glue          * 1.6.2   2022-02-24 [1] CRAN (R 4.2.1)
 htmltools     * 0.5.4   2022-12-07 [1] CRAN (R 4.2.2)
 knitr         * 1.40    2022-08-24 [1] CRAN (R 4.2.1)
 lubridate     * 1.8.0   2021-10-07 [1] CRAN (R 4.2.1)
 maps          * 3.4.0   2021-09-25 [1] CRAN (R 4.2.1)
 ozmaps        * 0.4.5   2021-08-03 [1] CRAN (R 4.2.1)
 purrr         * 0.3.4   2020-04-17 [1] CRAN (R 4.2.1)
 readr         * 2.1.3   2022-10-01 [1] CRAN (R 4.2.2)
 sdmpredictors * 0.2.13  2022-09-13 [1] CRAN (R 4.2.1)
 sessioninfo   * 1.2.2   2021-12-06 [1] CRAN (R 4.2.1)
 sf            * 1.0-8   2022-07-14 [1] CRAN (R 4.2.1)
 SSDM          * 0.2.8   2020-02-28 [1] CRAN (R 4.2.1)
 stars         * 0.5-6   2022-07-21 [1] CRAN (R 4.2.1)
 stringr       * 1.5.0   2022-12-02 [1] CRAN (R 4.2.2)
 tibble        * 3.1.8   2022-07-22 [1] CRAN (R 4.2.1)
 tidyr         * 1.2.0   2022-02-01 [1] CRAN (R 4.2.1)
 tidyverse     * 1.3.2   2022-07-18 [1] CRAN (R 4.2.1)
 tmaptools     * 3.1-1   2021-01-19 [1] CRAN (R 4.2.2)

 [1] C:/Users/KEL329/R-packages
 [2] C:/Users/KEL329/AppData/Local/Programs/R/R-4.2.2/library

──────────────────────────────────────────────────────────────────────────────